Microsoft SQL Server 2008 R2 : Monitoring Replication (part 1) - Replication Monitoring SQL Statements

2/9/2012 9:16:06 AM

After replication is up and running, it is important to monitor it and see how it is performing. You can do this in several ways, including with SQL statements, SQL Server Management Studio, and Windows Performance Monitor. You are interested in the agents’ successes and failures, the speed at which replication is done, and the synchronization state of the tables involved in replication. Other things to watch are the size of the distribution database, the growth of the subscriber databases, and the free space available in the distribution server’s snapshot working directory.
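For the size checks, a quick look from T-SQL is often enough. The following is a minimal sketch that assumes the distribution database has its default name, distribution. MSrepl_transactions and MSrepl_commands hold the transactions and commands waiting to be delivered, so they are usually where growth shows up first:

```sql
USE distribution;   -- assumes the default distribution database name
GO
-- Overall size and unallocated space of the distribution database
EXEC sp_spaceused;
-- The two tables that hold replicated transactions and their commands
EXEC sp_spaceused N'dbo.MSrepl_transactions';
EXEC sp_spaceused N'dbo.MSrepl_commands';
GO
```

Running EXEC sp_spaceused; in a subscriber database gives the same overview for that database, which makes it easy to track subscriber growth over time.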

Replication Monitoring SQL Statements

One way to review the replication configuration (and, as shown later in this section, to validate row counts) is to use the replication stored procedures, including the following:

sp_helppublication: Publication information
sp_helparticle: Article definition information
sp_helpdistributor: Distributor information
sp_helpsubscriberinfo: Subscriber server information
sp_helpsubscription: Subscription information

These stored procedures are extremely useful for verifying exactly how replication is configured. If you execute them from the publication database, their combined output documents your complete replication topology and can be included in run books or other system documentation. Here’s what you might run to see how the current replication configuration has been built out:

use AdventureWorks2008
go
exec sp_helppublication
exec sp_helparticle @publication='PUBLISH AdventureWorks2008 - Transactional'
exec sp_helpdistributor
exec sp_helpsubscriberinfo
exec sp_helpsubscription
go

It yields results like the following (abbreviated to the first two result sets):

1 PUBLISH AdventureWorks2008 - Transactional       0     1     1     0
----------------------------------------------------------------------
1  Address            [Person].[Address]             Address
2  AddressType        [Person].[AddressType]         AddressType
3  AWBuildVersion     [dbo].[AWBuildVersion]         AWBuildVersion
4  BillOfMaterials    [Production].[BillOfMaterials] BillOfMaterials
5  Contact            [Person].[Contact]             Contact
6  ContactCreditCard  [Sales].[ContactCreditCard]    ContactCreditCard
7  ContactType        [Person].[ContactType]         ContactType
...

In addition, sp_replcounters returns replication statistics for each published database on the publisher, so you can see the volume of traffic, the throughput, and the latency:

exec sp_replcounters
go

It yields this result:

database            replicated transactions  replication rate trans/sec  replication latency (sec)  ...
AdventureWorks2008  0                        1562.5                      1.243                      ...
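sp_replcounters describes the published databases on the publisher. To see how much work is still waiting at the distributor for one particular subscription, you can call sp_replmonitorsubscriptionpendingcmds in the distribution database. In this sketch, PUBSERVER and SUBSERVER are hypothetical server names, the subscription database is assumed to be named AdventureWorks2008, and the subscription is assumed to be a push subscription:

```sql
USE distribution;   -- run at the distributor
GO
-- Pending commands and estimated catch-up time for one subscription.
-- PUBSERVER and SUBSERVER are hypothetical placeholders.
EXEC sp_replmonitorsubscriptionpendingcmds
    @publisher         = N'PUBSERVER',
    @publisher_db      = N'AdventureWorks2008',
    @publication       = N'PUBLISH AdventureWorks2008 - Transactional',
    @subscriber        = N'SUBSERVER',
    @subscriber_db     = N'AdventureWorks2008',
    @subscription_type = 0;   -- 0 = push, 1 = pull
GO
```

The result shows the number of pending commands and an estimate of how many seconds the subscriber needs to process them, which complements the publisher-side view from sp_replcounters.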

For actual row count validation, you can use sp_publication_validation, which checks that the row counts at each subscriber match those at the publisher:

exec sp_publication_validation @publication
    = 'PUBLISH AdventureWorks2008 - Transactional'
go

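It yields the following messages. These are the expected row counts computed at the publisher; the Distribution Agent compares them with each subscriber when it processes the validation and reports any mismatch in its history: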

Generated expected rowcount value of 19614 for Address.
Generated expected rowcount value of 6 for AddressType.
Generated expected rowcount value of 1 for AWBuildVersion.
Generated expected rowcount value of 2679 for BillOfMaterials.
Generated expected rowcount value of 19972 for Contact.
Generated expected rowcount value of 19118 for ContactCreditCard.
Generated expected rowcount value of 20 for ContactType.
Generated expected rowcount value of 238 for CountryRegion.
Generated expected rowcount value of 109 for CountryRegionCurrency.
Generated expected rowcount value of 19118 for CreditCard.
Generated expected rowcount value of 8 for Culture.
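When only one table is suspect, validating the whole publication is more work than necessary. A minimal sketch using sp_article_validation, run in the publication database, checks a single article. The Address article name comes from the output above, and the parameter values are one reasonable choice rather than the only one:

```sql
USE AdventureWorks2008;   -- the publication database
GO
-- Validate one article instead of the whole publication.
-- @rowcount_only = 1 : compare row counts only (no checksum)
-- @full_or_fast  = 0 : full COUNT(*) instead of the fast metadata count
EXEC sp_article_validation
    @publication   = N'PUBLISH AdventureWorks2008 - Transactional',
    @article       = N'Address',
    @rowcount_only = 1,
    @full_or_fast  = 0;
GO
```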

Another way to monitor replication is to look at the actual data being replicated. To do this, run a SELECT COUNT(*) FROM tblname statement against the published table on the publisher and against the replicated table on the subscriber, and compare the results. Then verify directly whether the most current data is present at the subscriber: if you change data in the published table, do the changes show up in the replicated table? If not, you might need to investigate how replication was configured on the server.
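Both checks can be sketched as follows. This assumes the subscription database keeps the publisher’s schema and table names and that you are working in a test environment where touching a row is acceptable; Person.Address is used only because it appears in the article list above, and AddressID 1 is an assumed existing key:

```sql
-- Run on the publisher, then run it unchanged in the subscription database
-- and compare the two result rows.
SELECT COUNT(*)          AS row_count,
       MAX(ModifiedDate) AS latest_modified
FROM Person.Address;

-- On the publisher only: touch one row so there is a change to replicate.
UPDATE Person.Address
SET ModifiedDate = GETDATE()
WHERE AddressID = 1;

-- On the subscriber, a few seconds later: the new timestamp should appear.
SELECT AddressID, ModifiedDate
FROM Person.Address
WHERE AddressID = 1;
```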

If you are allowing updatable subscriptions with queued updating, the replication queue comes into play, and you should become familiar with the Queue Reader Agent command-prompt utility (queueread in SQL Server 2000, qrdrsvc in later versions). This utility configures and starts the Queue Reader Agent, which reads messages stored in the SQL Server queue or a Microsoft Message Queuing queue and applies those messages to the publisher.

To help you visualize how replication works, and to help you monitor it, the following sample stored procedure, called REPL_ROWS_GENERATOR, generates new rows in the Customer table so that there is fresh data activity to be published. It takes one parameter: the number of rows (new customers) to insert in a single run:

Use AdventureWorks2008
Go
--------------------------------------------------------
-- generate 500 new customers for replication testing --
--------------------------------------------------------
exec REPL_ROWS_GENERATOR 500
go

This example shows how to execute this stored procedure to insert 500 new customers. If you don’t supply any parameter, the default is 100 new customers. Try it out.
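The source of REPL_ROWS_GENERATOR is not reproduced here, so the following is only a hypothetical sketch of a procedure with the same interface: one row-count parameter that defaults to 100 and an INSERTING ROW message per insert. It assumes the AdventureWorks2008 Sales.Customer table is published and that supplying only TerritoryID is enough, because the other columns are identity, computed, nullable, or defaulted:

```sql
USE AdventureWorks2008;
GO
-- Hypothetical sketch: the book's own REPL_ROWS_GENERATOR may differ.
-- Assumes Sales.Customer (AdventureWorks2008 schema) is a published article.
CREATE PROCEDURE dbo.REPL_ROWS_GENERATOR
    @num_rows int = 100          -- default of 100 new customers
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @i int = 1;
    WHILE @i <= @num_rows
    BEGIN
        PRINT 'INSERTING ROW: ' + CAST(@i AS varchar(10));
        -- CustomerID is an identity, AccountNumber is computed, and
        -- rowguid/ModifiedDate have defaults; TerritoryID 1 is assumed to exist.
        INSERT INTO Sales.Customer (TerritoryID) VALUES (1);
        SET @i += 1;
    END;
END;
GO
```

Inserting one row per statement, rather than a single set-based INSERT, is deliberate here: it produces a steady stream of small transactions, which is closer to the activity you usually want to watch in Replication Monitor.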

The following messages appear after you execute the REPL_ROWS_GENERATOR stored procedure:

INSERTING ROW: 1
INSERTING ROW: 2
INSERTING ROW: 3
INSERTING ROW: 4
INSERTING ROW: 5
INSERTING ROW: 6
INSERTING ROW: 7
INSERTING ROW: 8
INSERTING ROW: 9
INSERTING ROW: 10
...
INSERTING ROW: 500

Figure 1 shows this stored procedure executing.

Figure 1. The executing REPL_ROWS_GENERATOR stored procedure for testing data replication.

Monitoring Replication within SQL Server Management Studio

As you can imagine, SQL Server Management Studio provides considerable information about the status of replication. Most of this information is available via Replication Monitor. In Replication Monitor, you can see the activity for publishers, distributors, and subscribers; you can see all agent details; and you can configure alerts.

Through Replication Monitor, you can also validate subscriptions to see whether replication is in sync. Navigate to the publication whose subscriptions you want to validate, right-click it, and choose Validate Subscriptions. This verifies that each subscriber has the same number of rows of replicated data as the publisher. You can validate all subscriptions or just a particular one. The validation options are extensive: fast row counts (based on metadata), actual row counts, and even checksum comparisons of the row data. This is a very useful feature. Figure 2 shows the results of running a complete subscription validation.
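The same choices are available from T-SQL through the parameters of sp_publication_validation. This sketch requests row counts plus a binary checksum, using full COUNT(*) counting; as with the dialog, the comparison results appear in the Distribution Agent history and in Replication Monitor:

```sql
USE AdventureWorks2008;   -- the publication database
GO
-- @rowcount_only = 2 : row count plus binary checksum (1 = row count only, the default)
-- @full_or_fast  = 0 : actual COUNT(*)
--                  1 : fast count from metadata
--                  2 : fast count first, full count only if they differ (the default)
EXEC sp_publication_validation
    @publication   = N'PUBLISH AdventureWorks2008 - Transactional',
    @rowcount_only = 2,
    @full_or_fast  = 0;
GO
```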

Figure 2. Validation of subscriptions via Replication Monitor.

Another great feature to help monitor replication is tracer tokens. Essentially, you create a marker (called a token) that flows through the full replication topology (from publisher to distributor to subscriber). It does not affect data tables! Each leg of this flow is timed for a specific publisher-to-subscriber path. Figure 3 shows the Tracer Tokens tab of Replication Monitor and the Insert Tracer button that you click to send a token through the topology. This lets you quickly see where bottlenecks exist (for example, from publisher to distributor or from distributor to subscriber) and the latency of the data flow along the way. In this example, the tracer token took 2 seconds in total to travel from the publisher through the distributor to the subscriber.
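Tracer tokens can also be posted and read from T-SQL, which is handy for scheduled latency checks. A minimal sketch, run at the publisher in the publication database; tracer tokens apply to transactional publications only, and the 30-second wait is an arbitrary assumption about how long the agents need to move the token:

```sql
USE AdventureWorks2008;   -- the publication database
GO
DECLARE @token_id int;

-- Post a tracer token into the publication's transaction log
EXEC sys.sp_addtracertoken
    @publication     = N'PUBLISH AdventureWorks2008 - Transactional',
    @tracer_token_id = @token_id OUTPUT;

-- Give the Log Reader and Distribution Agents time to carry the token
WAITFOR DELAY '00:00:30';

-- Distributor, subscriber, and overall latency for each subscription
EXEC sys.sp_helptracertokenhistory
    @publication = N'PUBLISH AdventureWorks2008 - Transactional',
    @tracer_id   = @token_id;
GO
```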

Figure 3. Tracer tokens for monitoring data replication throughput.

Troubleshooting Replication Failures

Configuring replication and monitoring for successful replication is relatively easy. The fun begins when failures start arising. Replication Monitor pays for itself quickly. Red flags begin appearing to indicate agent failures. Depending on how you have the alerts defined, you probably also get numerous emails or pages.

The following are the most common issues you find with data replication:

Data row count inconsistencies, as discussed in the preceding section
Subscriber/publisher schema change failures
Connection failures
Agent failures
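For connection and agent failures, the Distribution Agent history in the distribution database records each retry and failure. The following query is a sketch that assumes the default distribution database name. In that history, a runstatus of 5 means retry and 6 means failure, and error_id points to the matching row in MSrepl_errors:

```sql
USE distribution;   -- assumes the default distribution database name
GO
-- Most recent Distribution Agent retries (5) and failures (6)
SELECT TOP (50)
       a.name      AS agent_name,
       h.time,
       h.runstatus,
       h.comments,
       h.error_id
FROM dbo.MSdistribution_history AS h
JOIN dbo.MSdistribution_agents  AS a ON a.id = h.agent_id
WHERE h.runstatus IN (5, 6)
ORDER BY h.time DESC;
GO
```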

In conventional replication scenarios, if subscription validation fails, it is usually best to resynchronize the subscription, either by dropping it and resubscribing or by reinitializing it.
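Reinitializing can also be scripted. The following is a minimal sketch run at the publisher in the publication database; SUBSERVER and the destination database name are hypothetical placeholders, and generating a new snapshot right away assumes you want the Distribution Agent to apply it on its next run:

```sql
USE AdventureWorks2008;   -- the publication database
GO
-- Mark one subscription (all articles) for reinitialization.
-- SUBSERVER is a hypothetical subscriber name.
EXEC sp_reinitsubscription
    @publication    = N'PUBLISH AdventureWorks2008 - Transactional',
    @article        = N'all',
    @subscriber     = N'SUBSERVER',
    @destination_db = N'AdventureWorks2008';

-- Start the Snapshot Agent job to produce a fresh snapshot
EXEC sp_startpublication_snapshot
    @publication = N'PUBLISH AdventureWorks2008 - Transactional';
GO
```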

Another common issue is that the SQL Server Agent service doesn’t start; manually restarting the service usually shakes things loose. Sometimes an object on the subscriber becomes invalid, even though the subscription included it originally. The solution is usually to re-create that object, reload its data with bcp or SQL Server Integration Services (SSIS), and then resynchronize the subscription. With a heterogeneous subscriber, you often see connection errors caused by invalid login IDs in the ODBC connection; the quick fix is usually to redefine the ODBC data source connection information.

A much more complex failure can arise when the replication queue stops because a replicated SQL command fails every time it is applied. This situation is extremely serious because it stops all replication from continuing, and the distribution database starts growing rapidly. Replication keeps trying to execute the command, but it fails each time, so the command becomes a permanent roadblock. The solution is to locate the exact transaction in the distribution database and physically delete it from the transaction queue. This action is highly unusual, but it is necessary when the circumstance presents itself.

First, look at the error details in the Distribution Agent history to isolate the SQL statement on which the agent is choking. Then find that statement in the distribution database. Start by executing the sp_browsereplcmds stored procedure in the distribution database; it returns the replicated transactions (each identified by its xact_seqno) along with the associated SQL commands. Because this output can be large, send it to a text file and search the file for the matching SQL command. When you locate it, note its transaction sequence number (xact_seqno) and use that value to delete the command from the MSrepl_commands table in the distribution database. This removes the roadblock. With luck, you will see this type of issue no more than once every six months, if at all.
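Rather than dumping every command to a text file, you can often narrow the search first. The following sketch assumes the default distribution database name and that MSrepl_errors has recorded the xact_seqno and command_id of the failing command, which is often but not always the case. The sequence number and command_id values are hypothetical placeholders, and the distribution database should be backed up before anything is deleted:

```sql
USE distribution;   -- assumes the default distribution database name
GO
-- Step 1: most recent replication agent errors. For a command that fails
-- at the subscriber, xact_seqno and command_id usually identify it.
SELECT TOP (20) time, error_code, error_text, xact_seqno, command_id
FROM dbo.MSrepl_errors
ORDER BY time DESC;
GO
-- Step 2: browse only that transaction instead of every pending command.
-- Hypothetical sequence number; paste the value returned by Step 1.
EXEC sp_browsereplcmds
    @xact_seqno_start = '0x0000002A000001A20004',
    @xact_seqno_end   = '0x0000002A000001A20004';
GO
-- Step 3: delete the offending command inside a transaction and keep
-- the change only if exactly one row was removed.
DECLARE @bad_seqno      varbinary(16) = 0x0000002A000001A20004;  -- hypothetical
DECLARE @bad_command_id int           = 1;                        -- hypothetical
BEGIN TRANSACTION;
DELETE FROM dbo.MSrepl_commands
WHERE xact_seqno = @bad_seqno
  AND command_id = @bad_command_id;
IF @@ROWCOUNT = 1
    COMMIT TRANSACTION;
ELSE
    ROLLBACK TRANSACTION;
GO
```

Dropping the command_id predicate deletes every command in that transaction, which matches the approach described above; in that case, the single-row check in Step 3 must be removed as well.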